A symmetric kernel partial least squares framework for speaker verification
Abstract
I-vectors are concise representations of speaker characteristics. Recent progress in i-vector research has exploited their ability to capture speaker and channel variability to build efficient automatic speaker verification (ASV) systems. Inter-speaker relationships in the i-vector space are nonlinear, so effective speaker verification requires modeling these nonlinearities well, which can be cast as a machine learning problem. Kernel partial least squares (KPLS) can be used for discriminative training in the i-vector space. However, this framework suffers from training data imbalance and asymmetric scoring. We address both issues with one-shot similarity scoring (OSS). The resulting ASV system (OSS-KPLS) is tested on several conditions of the NIST SRE 2010 extended core data set and compared against state-of-the-art systems: Joint Factor Analysis (JFA), Probabilistic Linear Discriminant Analysis (PLDA), and Cosine Distance Scoring (CDS) classifiers. The comparison shows improvements.
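The symmetrization idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. The abstract does not give the KPLS configuration, so the following are all assumptions: a cosine kernel on i-vectors, a single latent component, labels +1 for the one enrollment i-vector and -1 for a shared background set, and averaging as the way to combine the two one-sided scores. The names `OneComponentKPLS` and `oss_score` are hypothetical. Each side of a trial trains its own KPLS model against the background set and scores the other side, so swapping enrollment and test leaves the score unchanged.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_kernel(a: Vector, b: Vector) -> float:
    """Cosine similarity between two i-vectors (assumed kernel choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class OneComponentKPLS:
    """Hypothetical kernel PLS regressor: one response, one latent component.

    With a single response column, the NIPALS iteration gives u proportional
    to the centred labels yc and t = Kc u / ||Kc u||, so the prediction
    K_test u (t^T Kc u)^-1 t^T yc reduces to a closed form.
    """

    def __init__(self, X: List[Vector], y: List[float]) -> None:
        n = len(X)
        self.X = X
        self.y_mean = sum(y) / n
        self.yc = [v - self.y_mean for v in y]          # centred labels (u)
        K = [[cosine_kernel(a, b) for b in X] for a in X]
        self.m = [sum(row) / n for row in K]            # row (= column) means
        self.g = sum(self.m) / n                        # grand mean
        # Feature-space centring: Kc = (I - 11^T/n) K (I - 11^T/n).
        Kc = [[K[i][j] - self.m[i] - self.m[j] + self.g for j in range(n)]
              for i in range(n)]
        w = [sum(Kc[i][j] * self.yc[j] for j in range(n)) for i in range(n)]
        ww = sum(v * v for v in w)                      # ||Kc u||^2
        if ww == 0.0:
            raise ValueError("degenerate training set: Kc @ yc is zero")
        # (t^T Kc u)^-1 t^T yc with t = w / ||w|| simplifies to (w . yc) / ||w||^2.
        self.coef = sum(a * b for a, b in zip(w, self.yc)) / ww

    def predict(self, x: Vector) -> float:
        n = len(self.X)
        k = [cosine_kernel(x, xi) for xi in self.X]
        k_mean = sum(k) / n
        # Centre the test kernel row using the training statistics.
        kc = [k[j] - k_mean - self.m[j] + self.g for j in range(n)]
        return self.y_mean + self.coef * sum(a * b for a, b in zip(kc, self.yc))


def oss_score(x1: Vector, x2: Vector, background: List[Vector]) -> float:
    """Symmetric score: train on each side vs. background, score the other, average."""
    labels = [1.0] + [-1.0] * len(background)
    s12 = OneComponentKPLS([x1] + background, labels).predict(x2)
    s21 = OneComponentKPLS([x2] + background, labels).predict(x1)
    return 0.5 * (s12 + s21)


if __name__ == "__main__":
    background = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
    enroll, same, other = [0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [1.0, 0.1, 0.0]
    print(oss_score(enroll, same, background))   # same-speaker-like trial
    print(oss_score(enroll, other, background))  # different-speaker-like trial
```

The sketch also shows the imbalance the abstract mentions: each one-sided model sees one positive example against many background negatives. A full system would use more latent components and real i-vectors. The toy three-dimensional vectors above only exercise the code path.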
Similar resources
Kernel Partial Least Squares for Speaker Recognition
I-vectors are a concise representation of speaker characteristics. Recent advances in speaker recognition have utilized their ability to capture speaker and channel variability to develop efficient recognition engines. Inter-speaker relationships in the i-vector space are non-linear. Accomplishing effective speaker recognition requires a good modeling of these non-linearities and can be cast as ...
Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares
The i-vector framework is one of the most popular methods in speaker verification, and estimating a total variability space (TVS) is a key part of the i-vector framework. Current estimation methods pay little attention to the discrimination of the TVS, but discrimination is important enough to affect performance. So we focus on the discrimination of TVS to achieve a be...
Title of dissertation: SCALABLE LEARNING FOR GEOSTATISTICS AND SPEAKER RECOGNITION, Balaji Vasan Srinivasan, Doctor of Philosophy, 2011
Title of dissertation: SCALABLE LEARNING FOR GEOSTATISTICS AND SPEAKER RECOGNITION Balaji Vasan Srinivasan Doctor of Philosophy, 2011 Thesis directed by: Professor Ramani Duraiswami Department of Computer Science With improved data acquisition methods, the amount of data that is being collected has increased several fold. One of the objectives in data collection is to learn useful underlying pa...
Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression
Voice conversion aims at converting speech from one speaker to sound as if it was spoken by another specific speaker. The most popular voice conversion approach based on Gaussian mixture modeling tends to suffer either from model overfitting or oversmoothing. To overcome the shortcomings of the traditional approach, we recently proposed to use dynamic kernel partial least squares (DKPLS) regres...
Scalable learning for geostatistics and speaker recognition
With improved data acquisition methods, the amount of data that is being collected has increased several fold. One of the objectives in data collection is to learn useful underlying patterns. In order to work with data at this scale, the methods not only need to be effective with the underlying data, but also have to be scalable to handle larger data collections. My research focused on developi...